The analysis was performed on the Right Heart Catheterization (RHC) dataset, first analysed by Connors et al. (1996).
Before cleaning and augmentation:
5735 patients
62 attributes
After cleaning and augmentation:
5612 patients
53 attributes
The dataset contains patient, socioeconomic, physiological, disease, and survival information.
We performed our analysis using \(\color{red}{\text{Tidyverse}}\).
rhc_aug |>
  mutate(sex = factor(sex),
         swang1 = factor(swang1),
         death = factor(x = death, levels = c(0, 1), labels = c("Alive", "Dead"))) |>
  table1(x = formula(~ sex + age + race + swang1 | death),
         data = _)

| | Alive (N=1972) | Dead (N=3640) | Overall (N=5612) |
|---|---|---|---|
| sex | | | |
| Female | 906 (45.9%) | 1594 (43.8%) | 2500 (44.5%) |
| Male | 1066 (54.1%) | 2046 (56.2%) | 3112 (55.5%) |
| age | | | |
| Mean (SD) | 56.6 (17.4) | 64.0 (15.7) | 61.4 (16.7) |
| Median [Min, Max] | 58.0 [18.0, 102] | 66.0 [18.0, 101] | 64.0 [18.0, 102] |
| race | | | |
| black | 323 (16.4%) | 577 (15.9%) | 900 (16.0%) |
| other | 121 (6.1%) | 223 (6.1%) | 344 (6.1%) |
| white | 1528 (77.5%) | 2840 (78.0%) | 4368 (77.8%) |
| swang1 | | | |
| 0 | 1291 (65.5%) | 2177 (59.8%) | 3468 (61.8%) |
| 1 | 681 (34.5%) | 1463 (40.2%) | 2144 (38.2%) |
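
The stratified counts and column percentages in a table like this can also be computed by hand, which is a useful sanity check. The following is a minimal sketch on a made-up toy data frame (`toy` and its values are assumptions for illustration and are not taken from `rhc_aug`); it assumes the `dplyr` package is installed.

```r
library(dplyr)

# Toy data standing in for rhc_aug; the values are invented for illustration
toy <- tibble(
  sex   = c("Female", "Male", "Male", "Female", "Male", "Female"),
  death = c(0, 1, 1, 1, 0, 0)
)

sex_by_death <- toy |>
  # Label the outcome the same way as in the table above
  mutate(death = factor(death, levels = c(0, 1), labels = c("Alive", "Dead"))) |>
  # Count patients per outcome and sex
  count(death, sex) |>
  # Percentages are computed within each outcome group (column percentages)
  group_by(death) |>
  mutate(pct = round(100 * n / sum(n), 1)) |>
  ungroup()

print(sex_by_death)
```

Applying the same pipeline to the real data would only require swapping `toy` for the cleaned data frame, assuming it has `sex` and `death` columns coded as above.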
Violin plot \(\rightarrow\) multimodal / bimodal distribution
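
To illustrate how a violin plot reveals a bimodal distribution, here is a minimal sketch on simulated data. The column names `group` and `value` are hypothetical and the numbers are simulated, so this is not the plot from the analysis; it assumes the `ggplot2` package is installed.

```r
library(ggplot2)

set.seed(1)

# Simulated data only: group "A" is unimodal, group "B" is a mixture of two normals
sim <- data.frame(
  group = rep(c("A", "B"), each = 200),
  value = c(
    rnorm(200, mean = 50, sd = 8),
    rnorm(100, mean = 35, sd = 5),
    rnorm(100, mean = 70, sd = 5)
  )
)

# The violin shows the full density, so the two modes in group "B" are visible;
# the narrow boxplot inside summarises median and quartiles only
p <- ggplot(sim, aes(x = group, y = value, fill = group)) +
  geom_violin(alpha = 0.6) +
  geom_boxplot(width = 0.1, fill = "white") +
  labs(x = "Group", y = "Value") +
  theme_minimal()

print(p)
```

A boxplot alone would hide the two modes in group "B", which is why the violin is the more informative choice when multimodality is suspected.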
More beautiful plots by Emilie
How come we found no major discoveries?
What could have been done differently?
We can conclude that principal components (PC) may be useful for further analysis.
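
If "PC" here refers to principal components, a follow-up analysis could start from something like the sketch below. It uses simulated data with hypothetical columns `x1`, `x2`, and `x3` (not variables from the RHC dataset) and base R's `prcomp()`.

```r
set.seed(1)
n <- 200

# Simulated data only: x1 and x2 share a latent signal, x3 is independent noise
latent <- rnorm(n)
sim <- data.frame(
  x1 = latent + rnorm(n, sd = 0.3),
  x2 = latent + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)

# Centre and scale so that each variable contributes on the same footing
pca <- prcomp(sim, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

The proportion of variance per component gives a first indication of how many components are worth carrying into further analysis.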
We can conclude that, for several diagnoses, high APS values are associated with an increased risk of death.
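
One way to quantify an association between a severity score and death is a logistic regression. The sketch below is an assumption about how this could be done, not the method used in the analysis: the data are simulated, `aps` is a hypothetical numeric score, and `death` is a hypothetical 0/1 outcome.

```r
set.seed(1)
n <- 500

# Simulated data only: the risk of death rises with the score by construction
sim <- data.frame(aps = runif(n, min = 20, max = 100))
sim$death <- rbinom(n, size = 1, prob = plogis(-3 + 0.05 * sim$aps))

# Logistic regression of death on the score
fit <- glm(death ~ aps, data = sim, family = binomial())

# Odds ratio per one-point increase in the score
odds_ratio <- exp(coef(fit)[["aps"]])
print(odds_ratio)
```

An odds ratio above 1 indicates that higher scores go with higher odds of death. Fitting the model separately per diagnosis, or adding an interaction term, would be a natural extension for comparing diagnoses.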
R for Bio Data Science